Multilingual Terminology Extraction and Validation

Authors

  • Antonio S. Valderrabanos
  • Alexander Belskis
  • Luis Iraola Moreno
Abstract

This paper presents the automatic terminology extraction approach developed within the LIQUID project. The project aims to develop a cost-effective solution to the problem of cross-language access to multilingual text databases in technical and scientific domains. Cross-Language Information Retrieval faces a major challenge: organizing unstructured textual information according to its content, regardless of its language. Our solution is based on two main components: a terminology extraction tool and a domain-specific ontology. The terminology extraction tool identifies the terms that describe the contents of a particular document; these terms are then linked to the domain-specific ontology. This paper presents the terminology extraction tool and the experimental results obtained in the domain of Gastroenterology.

LIQUID is an RTD project funded by the European Commission under the 5th Framework Programme (IST-2000-25324). It started on January 1st, 2001. Four languages are considered in the project: French, German, Spanish, and English.

Similar resources

Computer-aided analysis of multilingual patent documentation

This paper deals with the processing of a multilingual corpus of technical texts. The aim is to extract special purpose terminology. A semi-automatic tool is developed to help professional translators and terminologists not only to identify technical terms but also to detect possible translation equivalences and typical contexts of terms. Definitions of terminology reported in the literature ar...

TTC TermSuite - A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora

This paper aims at presenting TTC TermSuite: a tool suite for multilingual terminology extraction from comparable corpora. This tool suite offers a user-friendly graphical interface for designing UIMA-based tool chains whose components (i) form a functional architecture, (ii) manage 7 languages of 5 different families, (iii) support standardized file formats, (iv) extract single- and multiword ter...

TExtractor: a multilingual terminology extraction tool

This demonstration presents a tool (TExtractor) employed for enriching terminology sets in four languages: English, French, German and Spanish. We present the associated linguistic resources and the experimental results obtained in the medical domain. TExtractor has been developed within project LIQUID (IST-2000-25324), which aims at developing a cost-effective solution for the problem of cross...

Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora

This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...

Multilingual Ontologies and English-Bulgarian Ontology Development

In this paper we give a short survey of approaches to the development of multilingual ontologies. Our main goal is to find an appropriate approach for the development of multilingual ontologies, including Bulgarian-language terminology. We propose a collaborative methodology for the development of English-Bulgarian bilingual ontologies using information extraction from e-learning textual content, l...

Publication date: 2002